Using etcd for Distributed Locking
Learn to use etcd for distributed locking in Python.
Introduction to etcd#
etcd is a popular distributed key/value store. It stores keys and values and replicates them across several nodes, any of which can be used to query or update the keys. To keep its members in sync, etcd implements the Raft algorithm. Raft solves the consensus problem in distributed systems by electing a leader that is responsible for maintaining consistency across nodes. In short, Raft makes sure that the data stored by etcd is consistent across all nodes, and that if a node crashes, the etcd cluster can continue to operate until the failed node comes back to life.
This makes etcd a good fit for implementing a distributed lock. The basic idea behind the lock algorithm is to write some value to a predetermined key. All the services in the cluster would pick, for example, the key lock1 and try to write the value acquired to it. Since etcd supports transactions, this operation can be executed in a transaction that fails if the key already exists; in that case, the lock cannot be acquired. If the transaction succeeds, the lock is acquired by the process that managed to create the key and write the value.
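The compare-and-create step can be sketched with python-etcd3's transaction API. The key name lock1, the stored value, and the helper functions below are illustrative, not part of the library, and a running etcd server on localhost:2379 is assumed.

```python
def try_acquire(client, key='lock1'):
    """Atomically create `key` only if it does not exist yet.

    Returns True when this client acquired the lock. Assumes python-etcd3's
    transaction comparators; a create revision of 0 means the key has never
    been created.
    """
    succeeded, _ = client.transaction(
        compare=[client.transactions.create(key) == 0],
        success=[client.transactions.put(key, 'acquired')],
        failure=[],
    )
    return succeeded


def release(client, key='lock1'):
    """Release the lock by deleting the key."""
    client.delete(key)


if __name__ == '__main__':
    try:
        import etcd3  # imported lazily so the sketch loads without the package
        client = etcd3.client()  # connects to localhost:2379 by default
        print('acquired:', try_acquire(client))
        release(client)
    except Exception as exc:  # etcd3 not installed or no etcd server running
        print('skipped (needs a running etcd):', exc)
```

Because the comparison and the put run in a single transaction, two clients racing for the same key cannot both succeed.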
To release a lock, the client that acquired it just has to delete the key from etcd.
etcd is able to notify clients when a key is modified or deleted. Therefore, any client that was unable to create the key (and acquire the lock) can be notified as soon as the lock is released. Then it can try to acquire it.
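That notification mechanism can be sketched with python-etcd3's watch API; the helper name and the key are illustrative, and a client connected to a running etcd server is assumed.

```python
def wait_for_release(client, key='lock1'):
    """Block until `key` is deleted, then return so the caller can retry.

    Uses python-etcd3's watch API: `client.watch` returns an iterator of
    events plus a cancel callable.
    """
    import etcd3.events  # imported lazily so the sketch loads without the package

    events, cancel = client.watch(key)
    for event in events:
        if isinstance(event, etcd3.events.DeleteEvent):
            cancel()  # stop watching; the lock is free to be acquired again
            return
```

A client that failed to acquire the lock would call this helper and then retry the transactional create.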
If the client that acquired the lock crashes, the lock becomes impossible to release. To avoid such cases, each key is given a time-to-live at creation and needs to be refreshed for as long as the client keeps the lock.
This workflow is a basic locking algorithm that can be implemented with many other key/value stores.
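To make the workflow concrete without a server, here is a minimal in-memory simulation; the FakeStore class is purely illustrative and only mimics the transactional create, refresh, and delete operations described above.

```python
import time


class FakeStore:
    """Tiny in-memory stand-in for a key/value store with TTLs.

    Only illustrates the locking workflow; real code would talk to etcd.
    """

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp)

    def create(self, key, value, ttl):
        """Transactionally create `key`; fail if it exists and has not expired."""
        now = time.monotonic()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False  # lock is still held by someone else
        self._data[key] = (value, now + ttl)
        return True

    def refresh(self, key, ttl):
        """Extend the time-to-live of a held key."""
        value, _ = self._data[key]
        self._data[key] = (value, time.monotonic() + ttl)

    def delete(self, key):
        """Release the lock by removing the key."""
        self._data.pop(key, None)


store = FakeStore()
assert store.create('lock1', 'acquired', ttl=0.05) is True   # lock acquired
assert store.create('lock1', 'acquired', ttl=0.05) is False  # already held
store.delete('lock1')                                        # release
assert store.create('lock1', 'acquired', ttl=0.05) is True   # can re-acquire
time.sleep(0.1)                                              # let the TTL lapse
assert store.create('lock1', 'acquired', ttl=0.05) is True   # expired, so free
```

The expiry check inside create is what lets a crashed client's lock be reclaimed once its TTL lapses.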
Rather than implementing this algorithm from scratch (I’ll leave that to you as an exercise), it is easier and safer, and therefore preferable, to leverage the etcd3.locks.Lock class provided by the Python etcd3 package.
The next executable example in this lesson shows how to get a lock from your local etcd server. As long as the lock is acquired, no other process can grab it. Since etcd is a network service, you can easily synchronize your process across a network using this simple lock mechanism.
Note: You need to start etcd to make the examples work. Simply run etcd from the command line in a separate terminal (open a new terminal using the + button beside the Terminal tabs) before running each example as instructed.
After running etcd in a separate terminal, it prints its startup log and begins listening for client connections (on port 2379 by default).
Acquiring a lock with etcd#
The following example shows two different ways to acquire the lock. You can call lock.acquire() directly, or you can use the lock as a context manager with the with statement, which is more readable and handles exceptions more easily.
To run the following example, click Run and then use the command python etcd3-lock-acquire.py after running etcd in a separate terminal.
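As a rough idea of what such a script could contain, here is a hedged sketch assuming python-etcd3's Lock API (client.lock, acquire, release) and an etcd server on localhost:2379; the lock names are illustrative.

```python
def main():
    import etcd3  # imported lazily so the sketch loads without the package

    client = etcd3.client()  # connects to localhost:2379 by default

    # Method 1: explicit acquire()/release()
    lock = client.lock('lock-1', ttl=60)
    lock.acquire()
    try:
        print('lock-1 acquired')
    finally:
        lock.release()

    # Method 2: the with statement releases the lock automatically,
    # even if the body raises an exception
    with client.lock('lock-2', ttl=60):
        print('lock-2 acquired')


if __name__ == '__main__':
    try:
        main()
    except Exception as exc:  # etcd3 not installed or no etcd server running
        print('skipped (needs a running etcd):', exc)
```

The try/finally in the first method is exactly the boilerplate that the with statement removes in the second.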
For more robustness, deploy etcd as a cluster of several nodes. That way, if the etcd server your application connects to goes down, the rest of the cluster can continue to work, and so can your clients, as long as they switch to a different server when an error occurs (though this failover is not implemented in python-etcd3 yet).
Locking service with etcd and Cotyledon#
The next example implements a distributed service using the Cotyledon library. It spawns four different processes, and only one is authorized to print at any given time.
To run the following example, click Run and then use the command python etcd3-lock-service.py after running etcd in a separate terminal.
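A sketch of such a service might look like the following, assuming Cotyledon's Service/ServiceManager API and python-etcd3; the class name, lock name, and timings are illustrative.

```python
import time


def main():
    # imported lazily so the sketch loads without these packages
    import cotyledon
    import etcd3

    class PrinterService(cotyledon.Service):
        """A worker that prints only while holding the shared lock."""

        name = 'printer'

        def __init__(self, worker_id):
            super().__init__(worker_id)
            self._shutdown = False

        def run(self):
            client = etcd3.client()
            while not self._shutdown:
                # only one worker at a time can hold this lock
                with client.lock('print-lock', ttl=60):
                    print('%s: I have the lock!' % self.worker_id)
                    time.sleep(1)

        def terminate(self):
            self._shutdown = True

    manager = cotyledon.ServiceManager()
    manager.add(PrinterService, workers=4)  # spawn four worker processes
    manager.run()


if __name__ == '__main__':
    try:
        import etcd3
        etcd3.client().status()  # fail fast when etcd is unreachable
    except Exception as exc:
        print('skipped (needs a running etcd):', exc)
    else:
        main()
```

ServiceManager supervises the four processes and restarts any that die, while the etcd lock serializes their print statements.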
You can run this program any number of times on any number of machines on your network and be sure that one and only one process at a time will own this lock and be able to print its line. Since the lock is acquired for a tiny amount of time (a print operation and one second of sleep), we do not expect it to time out. The default time-to-live for a lock is 60 seconds, which ought to be enough. If the program takes longer than that to print something and sleep for one second, then something is wrong, and it might be better to let the lock expire.
However, for sustained operations, an application should not cheat and extend the time-to-live value. The program should keep the lock active by regularly calling the Lock.refresh method.
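One way to do this is a small heartbeat that refreshes the lock in the background while the work runs. The hold_lock helper and its interval below are illustrative, assuming a lock object with python-etcd3's acquire/refresh/release methods.

```python
import threading


def hold_lock(lock, work, refresh_every=20):
    """Run `work()` while refreshing `lock` in the background.

    `lock` is expected to behave like etcd3.locks.Lock; the interval must be
    shorter than the lock's time-to-live, or the lease will expire anyway.
    """
    stop = threading.Event()

    def heartbeat():
        # wake up every `refresh_every` seconds until told to stop
        while not stop.wait(refresh_every):
            lock.refresh()  # extend the lease before the TTL expires

    lock.acquire()
    thread = threading.Thread(target=heartbeat, daemon=True)
    thread.start()
    try:
        return work()
    finally:
        stop.set()
        thread.join()
        lock.release()
```

The try/finally guarantees the heartbeat stops and the lock is released even when the work raises an exception.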
Combining such a distributed lock mechanism and a library like Cotyledon can make building a distributed service straightforward.